Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove multiqc versions channel mix #1966

Merged
merged 2 commits into from
Oct 25, 2022
Merged

Remove multiqc versions channel mix #1966

merged 2 commits into from
Oct 25, 2022

Conversation

mahesh-panchal
Copy link
Member

@mahesh-panchal mahesh-panchal commented Oct 24, 2022

  • Removes mixing MultiQC version into channel that is already closed and not used any more.
  • Updates unique operator to filter on version file contents.

Fixes #1962

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

@mahesh-panchal mahesh-panchal linked an issue Oct 24, 2022 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Oct 24, 2022

Codecov Report

Merging #1966 (38ed718) into dev (263122a) will decrease coverage by 0.41%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##              dev    #1966      +/-   ##
==========================================
- Coverage   61.14%   60.72%   -0.42%     
==========================================
  Files          37       37              
  Lines        4676     4637      -39     
==========================================
- Hits         2859     2816      -43     
- Misses       1817     1821       +4     
Impacted Files Coverage Δ
nf_core/__main__.py 50.20% <0.00%> (-3.20%) ⬇️
nf_core/modules/test_yml_builder.py 48.41% <0.00%> (-1.36%) ⬇️
nf_core/sync.py 75.10% <0.00%> (-0.43%) ⬇️
nf_core/modules/modules_repo.py 73.58% <0.00%> (-0.37%) ⬇️
nf_core/modules/modules_differ.py 84.26% <0.00%> (-0.16%) ⬇️
nf_core/modules/lint/__init__.py 82.97% <0.00%> (-0.10%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@mashehu
Copy link
Contributor

mashehu commented Oct 25, 2022

Should we report the multiqc version earlier in the script then, so it is still recorded?

@mahesh-panchal
Copy link
Member Author

Is it not already? I thought this was already taken care of. Let me check.

@mahesh-panchal
Copy link
Member Author

So technically MultiQC reports it's own version, although not in the same place as the rest of the tools. I'm inclined to leave it this way unless there is strong opposition to this.

This seems one of those chicken and egg scenarios, if we try to do this with the versions.yml. We can modify the MQC code to separate the incoming versions file from the logs, swap the cat and multiqc command in the script, and use the cat to append the versions into the MQC ready versions.yml while the incoming versions file is renamed to a file that isn't going to be auto loaded by MQC.

@mashehu
Copy link
Contributor

mashehu commented Oct 25, 2022

yeah, wasn't sure how easy it is. The multiqc version on top should probably be enough.

@mahesh-panchal mahesh-panchal merged commit 81af6b7 into nf-core:dev Oct 25, 2022
@@ -82,7 +82,7 @@ workflow {{ short_name|upper }} {
ch_versions = ch_versions.mix(FASTQC.out.versions.first())

CUSTOM_DUMPSOFTWAREVERSIONS (
ch_versions.unique().collectFile(name: 'collated_versions.yml')
ch_versions.unique{ it.text }.collectFile(name: 'collated_versions.yml')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears to have introduced a bug, a lot of people's pipelines are hanging after the latest pipeline merge. See discussion in Slack here.

I guess that it was to make it more similar to other instances such as:

versions = versions.mix(BISMARK.out.versions.unique{ it.baseName })

However, we're not in a mix() here and it seems to silently hang forever without any errors. I think we need to revert this change unless anyone has any objections? I'll make an issue..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why the template CI checks didn't pick up on this. I found it because the methylseq CI failed due to timeouts. I would have expected the tools CI tests to do the same 🤔

Copy link
Member Author

@mahesh-panchal mahesh-panchal Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mixed channel (a few lines above). It's surprising that checking the text for uniqueness is causing a hang. There would appear to be a deeper cause, but I'm on leave now, so I don't have time to dig into it. It's definitely strange though. If it fixes the issue then revert it, but need to make sure duplicate versions.yml are not kept. It's mostly to do with duplicates emitted from subworkflows, since you can't use first().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, my mistake on it's need. It's for the case when a subworkflow is used in multiple places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewels OK. I figured out why unique{ it.text } is causing a hang.
People are doing this in their workflows:

ch_versions.mix(TOOL.out.versions.first().ifEmpty(null))

specifically, people are putting null objects in their versions channels. They don't need to emit anything ...

The correct code should be

ch_versions.mix(TOOL.out.versions.first())

Mixing an empty channel is fine.
My guess is there are null pointer exceptions under the hood, resulting in the hang.

That's why this was never caught before being integrated into workflows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, nice detective work! Sorry to bother you whilst you're away.

Is there any functional difference between unique() and unique{ it.text }?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique is not needed.

I have a feeling that even if the Python code is doing this, having duplicate keys within the YAML might make it invalid and crash the loading process? (not tested)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any functional difference between unique() and unique{ it.text }?

Yes, but I don't know how often the issue of duplication occurs. unique() is only checking for uniquely named files ( I think it must be checking the full path which includes the work directory ), which means here it's actually doing nothing, since the work directories are always unique. While most people use first() in their workflows, it doesn't help removing occurrences of modules used under different aliases for example from subworkflows. For example, versions from, WORKFLOW_A:SAMTOOLS and WORKFLOW_B:SAMTOOLS, should both be present in the versions channel, and since their work directories are unique, unique() won't do anything, but unique{ it.text } should remove the second occurrence.

@mahesh-panchal mahesh-panchal deleted the remove_multiqc_versions_channel_mix branch September 19, 2023 09:01
@lmReef lmReef mentioned this pull request Jan 21, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multiqc is redundant
3 participants